Lesson 7: Files, Databases, and Pickles
Persistence
So far, we have learned how to write programs and communicate our intentions to the central processing unit (CPU) using conditional execution, functions, and iteration. We have learned how to create and use data structures in main memory. The CPU and main memory are where our software works and runs; they are where all of the "thinking" happens.
But once the power is turned off, anything stored in either the CPU or main memory is erased. So up to now, our programs have just been transient fun exercises to learn Python.
In this lesson, we start to work with secondary memory, which is not erased when the power is turned off. With removable storage such as a USB flash drive, the data our programs write can even be removed from one system and carried to another.
Programs that keep their data this way can be persistent: they run for a long time (or all the time); they keep at least some of their data in permanent storage (a hard drive, for example); and if they shut down and restart, they pick up where they left off.
Examples of persistent programs are operating systems, which run pretty much whenever a computer is on, and web servers, which run all the time, waiting for requests to come in on the network.
One of the simplest ways for programs to maintain their data is by reading and writing text files.
An alternative is to store the state of the program in a database. In this lesson I will present a simple database and a module, pickle, that makes it easy to store program data.
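As a brief preview, here is a minimal sketch of what pickle does: it turns a Python object into bytes and back again. The settings dictionary is a hypothetical example, and the sketch uses pickle.dumps and pickle.loads, which work entirely in memory, so no file is needed yet.

```python
import pickle

# A hypothetical piece of program state we would like to keep between runs.
settings = {'user': 'alice', 'scores': [10, 20, 30]}

# pickle.dumps converts the object into a bytes string...
data = pickle.dumps(settings)

# ...and pickle.loads rebuilds an equal (but separate) object from those bytes.
restored = pickle.loads(data)

print(type(data))             # <class 'bytes'>
print(restored == settings)   # True
print(restored is settings)   # False
```

To make that state persistent, the bytes would be written to a file opened in binary mode, for example with pickle.dump(settings, f) on a file opened with 'wb'.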
We will primarily focus on reading and writing text files such as those we create in a text editor.
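To see persistence in action, the sketch below writes a few lines to a text file and then reads them back, just as a later run of the program might. The file name notes.txt is a hypothetical example created in the current folder, and the with statement (which closes each file automatically) is used here for brevity.

```python
# Write some program data to a text file ('w' creates the file or overwrites it).
with open('notes.txt', 'w') as fout:
    fout.write('first line\n')
    fout.write('second line\n')

# Later (even after a restart), open the same file and read the data back.
with open('notes.txt') as fin:
    contents = fin.read()

print(contents)
```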
Files
A text file is a sequence of characters stored on a permanent medium like a hard drive, flash memory, or CD-ROM.
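Because a text file is just a sequence of characters, reading the whole file with the read method gives back a single string, and the line breaks are ordinary newline characters ('\n') inside it. A minimal sketch, using a hypothetical file named sample.txt that the code creates first:

```python
# Create a small sample file so the example is self-contained.
with open('sample.txt', 'w') as fout:
    fout.write('Hello\nWorld\n')

# Reading the whole file gives back one string of characters.
with open('sample.txt') as fin:
    text = fin.read()

print(len(text))          # 12: five letters plus a newline, twice
print(text.count('\n'))   # 2: each line ends with a newline character
print(repr(text))         # 'Hello\nWorld\n'
```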
First Things First
For the examples in this lesson we need a few files.
The first one is called words.txt, and it is a list of 113,809 official crosswords; that is, words that are considered valid in crossword puzzles and other word games. This is part of the Moby lexicon project (see http://wikipedia.org/wiki/Moby_Project).
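Once you have downloaded words.txt, you can check its size yourself. This is a minimal sketch assuming the file holds one word per line; the helper name count_lines is hypothetical.

```python
import os

def count_lines(filename):
    """Return the number of lines in a text file."""
    count = 0
    with open(filename) as fin:
        for line in fin:   # the file object yields one line at a time
            count += 1
    return count

# Only run the check if words.txt is in the current folder.
if os.path.exists('words.txt'):
    print(count_lines('words.txt'))
```

If your copy matches the description above, the count should be 113,809.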
The second one, called mbox.txt, is a list of emails from an open source coding project.
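A quick way to get a feel for mbox.txt is to count the messages in it. This sketch assumes the file follows the usual mbox convention, where each message begins with a line starting with 'From ' (with a trailing space, unlike the 'From:' header line); the helper name count_messages is hypothetical.

```python
import os

def count_messages(filename):
    """Count messages, assuming each one starts with a 'From ' line."""
    count = 0
    with open(filename) as fin:
        for line in fin:
            if line.startswith('From '):   # note the space: 'From:' lines are not counted
                count += 1
    return count

# Only run the check if mbox.txt is in the current folder.
if os.path.exists('mbox.txt'):
    print(count_messages('mbox.txt'))
```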
The third, called romeo-full.txt, is from Act 2, Scene 2 of Romeo and Juliet.
You can download them here:
For ease, you will want to save these files in the same folder that Python is in when you start it. To find this folder, open IDLE and go to File > Save As. The default folder that is displayed is where you should save these files.
For example, on one Windows setup the default folder is \AppData\Local\Programs\Python\Python35-32\, so that is where I save the files.
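You can also ask Python directly which folder it is working in. A minimal sketch using os.getcwd, which returns the current working directory (where open looks when given a bare file name), and then checking which of the three lesson files are already there:

```python
import os

# The current working directory: where open('words.txt') will look.
folder = os.getcwd()
print(folder)

# Check which of the lesson files are present in that folder.
for name in ['words.txt', 'mbox.txt', 'romeo-full.txt']:
    path = os.path.join(folder, name)
    print(name, 'found' if os.path.exists(path) else 'missing')
```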
Opening Files
When we want to read or write a file, we first must open the file. Opening the file communicates with your operating system, which knows where the data for each file is stored. When you open a file, you are asking the operating system to find the file by name and make sure the file exists.
The words.txt file is in plain text, so you can open it with a text editor, but you can also read it from Python. The built-in function open takes the name of the file as a parameter and returns a file object you can use to read the file.
fin = open('words.txt')
fin is a common name for a file object used for input.
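With a file object in hand, the natural next step is to read from it. A minimal sketch that reads the first few lines with the readline method and then closes the file; the helper name first_lines is hypothetical.

```python
import os

def first_lines(filename, n):
    """Return up to the first n lines of a file, without trailing newlines."""
    fin = open(filename)
    lines = []
    for _ in range(n):
        line = fin.readline()   # returns '' once the end of the file is reached
        if line == '':
            break
        lines.append(line.rstrip('\n'))
    fin.close()                 # release the handle when we are done
    return lines

if os.path.exists('words.txt'):
    print(first_lines('words.txt', 3))
```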
If we display the value of fin, we get this:
Code | Output |
---|---|
print(fin) | <_io.TextIOWrapper name='words.txt' mode='r' encoding='cp1252'> |
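The displayed value lists three attributes of the file object: its name, its mode ('r' for reading, the default), and its encoding. When no encoding is passed, open uses your system's default, so 'cp1252' (a common Windows default) may appear differently on your machine. A minimal sketch that inspects these attributes and passes an explicit encoding instead; the file name demo.txt is a hypothetical example the code creates first.

```python
# Create a small file so the example is self-contained.
with open('demo.txt', 'w', encoding='utf-8') as fout:
    fout.write('hello\n')

fin = open('demo.txt', encoding='utf-8')   # explicit encoding instead of the system default
print(fin)            # <_io.TextIOWrapper name='demo.txt' mode='r' encoding='utf-8'>
print(fin.name)       # demo.txt
print(fin.mode)       # r
print(fin.encoding)   # utf-8
fin.close()
```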
If the open is successful, the operating system returns us a file handle. The file handle is not the actual data contained in the file; instead, it is a "handle" that we can use to read the data. You are given a handle if the requested file exists and you have the proper permissions to read it.
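Because the handle is not the data itself, the program decides how much to read and when. A minimal sketch showing that the handle remembers its position between reads; the file handle_demo.txt is a hypothetical example the code creates first.

```python
# Make a small file to read.
with open('handle_demo.txt', 'w') as fout:
    fout.write('line one\nline two\nline three\n')

fin = open('handle_demo.txt')
first = fin.readline()    # the handle reads just the next line...
second = fin.readline()   # ...and remembers where it stopped
rest = fin.read()         # read() returns everything that is left
fin.close()

print(repr(first))    # 'line one\n'
print(repr(second))   # 'line two\n'
print(repr(rest))     # 'line three\n'
```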
If the file does not exist, open will fail with a traceback and you will not get a handle to access the contents of the file:
Code | Output |
---|---|
fin = open('stuff.txt') | FileNotFoundError: [Errno 2] No such file or directory: 'stuff.txt' |
Later we will use try and except to deal more gracefully with the situation where we attempt to open a file that does not exist.
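As a short preview, here is a minimal sketch of that idea: catching the FileNotFoundError shown above instead of letting the program stop. The helper name open_or_none is hypothetical, and the sketch assumes stuff.txt does not exist.

```python
def open_or_none(filename):
    """Return a file object, or None if the file does not exist."""
    try:
        return open(filename)
    except FileNotFoundError:
        print('File not found:', filename)
        return None

fin = open_or_none('stuff.txt')   # assumes stuff.txt does not exist
if fin is not None:
    fin.close()
```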